Code
Insert code cell below
Ctrl+M B
Text
Add text cell
Notebook
Code Text

ψ

Part 0 - Completed

Part 1 - Completed

Part 2 - Completed

Part 3 - Completed

Part 4 - Completed

Code Text

ψ

Part 0: Critical Thinking

Problem Statement

What do you think is the average revenue of your favourite youtuber? Explain how did you conclude this answer and what was your approach towards this question in brief

Code Text

ψ

*SOLUTION *

For this part I have used the World Affairs Youtube channel

When it comes to predicting the average earnings of a YouTuber with a CPM of 1.50with1milliondailyviews.Onthebasisofthisdata,wecanpredictthatthisYouTuberwouldearnanywherebetween1,500.00 and $4,000.00 every day.

I have used the following method to determine the projected revenue for a YouTuber with 1 million daily views and a CPM of $1.50:

Estimated Daily Earnings = (Number of Daily Views / 1,000) x CPM

The calculation would therefore look like this if we entered the variables for a YouTuber with 1 million daily views and a CPM of $1.50:

Estimated Daily Earnings = (1,000,000 / 1,000) x 1.50=1,500.00

Now, we can estimate the YouTuber's monthly earnings, which range from 45,000.00to120,000.00, by taking these daily revenues and multiplying them by 30. And if we go one step further, we can get a projected annual income range of 540,000to1,440,000 if we multiply the monthly earnings by 12!

To calculate the estimated monthly earnings, we simply need to multiply the estimated daily earnings by 30:

Estimated Monthly Earnings = Estimated Daily Earnings * 30

Estimated Monthly Earnings = 1,500.0030=45,000.00

Now, to calculate the projected yearly earnings, we can multiply the estimated monthly earnings by 12:

Projected Yearly Earnings = Estimated Monthly Earnings x 12

Projected Yearly Earnings = 45,000.00x12=540,000.00

Code Text

ψ

Part 1: Descriptive Analysis

Code Text

import pandas as pd

Code Text


# Load the Excel file into a pandas dataframe

df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

# View the first five rows of the dataframe
print(df.head())
   Year  Month  Segment Region           KPI Value Type    Value
0  2020     12  Clients  India  Lv1_Visitors    Actuals  3665558
1  2020     12  Clients  India  Lv2_Visitors    Actuals  2689569
2  2020     12  Clients  India  Lv3_Visitors    Actuals  1300571
3  2020     12  Clients  India  Lv4_Visitors    Actuals   717608
4  2020     12  Clients  India  Lv3_Visitors    Actuals   706677
Code Text

ψ

Problem 1:

What type of company this dataset belongs to?

Solution :

According on the information in the dataset, it looks to be a funnel analysis dataset that is the property of a business involved in the e-commerce sector.

The dataset includes information on visitor counts at various phases of the customer journey, conversion rates, and income produced, from the first website visit through the final purchase stage.

Problem 2: Suppose that this dataset is for a website like Flipkart, what could be the possible definitions of the columns Level(visitors) 1, 2, 3, 4 and 5 in the given dataset? Do you observe any pattern?

Solution:

The possible definitions of the columns "Level (Visitors) 1, 2, 3, 4, and 5" could reflect the various stages of the customer journey if we suppose that this dataset is for a website like Flipkart.

Level 1 might correspond to site visitors who first arrived,

Level 2 to those who browsed particular product categories,

Level 3 to those who clicked on a particular product,

Level 4 to those who added items to their shopping cart, and

Level 5 to those who completed the checkout process.

Trend

From the data, we can see a trend in which the number of visitors gradually declines as we travel to higher levels, which is what is predicted by a funnel analysis. Additionally, we can see that the conversion rate tends to go up when we reach higher levels, showing that customers are more inclined to buy from website users who are more engaged with it. The necessity of optimizing the website's performance to maximize conversion rate and revenue production is highlighted by the fact that revenue generated also rises dramatically as we approach the funnel's final stages.

Code Text

ψ

Pivot view summary and suitable Visualization

Code Text

ψ

Problem 1

What was the total number of visitors segmented by each level, every month in each year?

Code Text

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Code Text

# Read the dataset into a pandas dataframe
df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')
df.head()
Code Text

df.Value
0       3665558
1       2689569
2       1300571
3        717608
4        706677
         ...   
1567       1693
1568       1428
1569       1311
1570       1071
1571        527
Name: Value, Length: 1572, dtype: int64
Code Text

# Create a pivot table to show the total number of visitors segmented by each level, every month in each year

pivot = pd.pivot_table(df, values='Value', index=['Year', 'Month'], columns=['Segment', 'KPI'], aggfunc=np.sum)

# View the pivot table
print(pivot)

Segment         Clients                                                      \
KPI        Lv1_Visitors Lv2_Visitors Lv3_Visitors Lv4_Visitors Lv5_Visitors   
Year Month                                                                    
2020 1          3322789      2304318      2410633      1157186       363939   
     2          2976450      2021915      2101659      1011351       318689   
     3          3453194      2334668      2440362      1186822       380754   
     4          4583672      3157506      3309406      1750651       555127   
     5          4703447      3333479      3393510     38408015       535507   
     6          4035885      2797449      2634588      1181799       415727   
     7          3586692      2461470      2225206       954541       335742   
     8          3133000      2125630      1876039       790587       270832   
     9          3377895      2296989      2052849       897110       311885   
     10         4286063      2923280      2720179      1147036       394290   
     11         3634761      2510682      2097387       938356       353697   
     12         3665558      2689569      2007248       897733       330491   
2021 1          3709579      2737312      2069173       929905       334932   
     2          3067802      2127445      1673076       732142       260215   
     3          3048799      1956176      1646093       739998       257440   
     4          3057604      1953667      1567928       714884       243135   
     5          3278319      2111235      1709112       782186       275921   
     6          2790598      1770074      1466886       642765       209801   
     7          2883471      1885690      1530813       730577       243400   
     8          3235262      2071017      1699258       813048       265349   
     9          3017046      1942594      1623241       752299       260747   
     10         3223253      2025461      1706380       780300       280499   
     11         3665788      2261094      1843438       887883       321797   
     12         3543797      2199420      1782771       813245       285523   
2022 1          5149212      3201562      2632792      1229833       459258   
     2          4614615      2826180      2301555      1076383       412729   
     3          4689717      2831029      2295964       948519       361298   
     4          4362681      2583815      2085700       956055       370116   
     5          4575236      2861299      2048233       909203       343327   
     6          4545397      2886077      2140205       932454       340917   
     7          4794170      3021946      2158995      1013750       365277   
     8          4602134      2851490      2151168       926916       341027   
     9          4522456      2790377      2287076       915374       326635   
     10         4705840      2894596      2227150       998514       375938   
     11         4857535      3089427      2352993      1118281       427054   
     12         4392558      2819845      2358706       988546       107913   

Segment       Customers                                                      
KPI        Lv1_Visitors Lv2_Visitors Lv3_Visitors Lv4_Visitors Lv5_Visitors  
Year Month                                                                   
2020 1           605252       460953       555356       235096        68562  
     2           634726       488635       610242       260685        70144  
     3           700476       549325       696667       303831        83111  
     4           952301       752792       887839       397094       108386  
     5           957269       784931       974202       439014       117812  
     6           894628       734666       880345       380616       103971  
     7           801056       651910       773498       322480        86563  
     8           717727       576340       659037       270785        70552  
     9           781782       630075       729504       309821        85514  
     10         1014962       828265       965466       417155       116507  
     11          879507       712544       783797       342597        98855  
     12          780674       615366       675758       295589        82591  
2021 1           889920       689741       739216       335905        96722  
     2           910358       725211       738563       349309       108845  
     3          1036446       847132       842369       396563       128086  
     4           836882       669841       700013       298319        84360  
     5           781835       623508       667727       268880        66645  
     6           721998       566659       599481       239613        57499  
     7           700108       552001       564444       231070        53886  
     8           659526       492179       461748       171445        35250  
     9           635646       452915       427475       143734        31027  
     10          616577       415019       363960       105240        27727  
     11          578423       396411       316443       115353        31779  
     12          629048       434953       311943       123228        29080  
2022 1           637719       410887       317661       117952        31744  
     2           595216       392437       311249       130017        32760  
     3           802788       536075       346418       126856        31683  
     4           646645       439196       295395       106921        27377  
     5           701577       458601       274222        83972        23640  
     6           724643       445181       260741        70733        20426  
     7           649114       434245       269701        85301        21294  
     8           634380       435715       280149        82156        23418  
     9           588864       411400       276388        80989        22965  
     10          595663       418393       296815        94824        24425  
     11          619135       444157       331189       109141        30289  
     12          587719       400044       286948        92106         7303  
Code Text

# Now, making the table more visualized so that it is easy to understand
# Create a heatmap to visualize the pivot table

plt.figure(figsize=(15, 8))
sns.heatmap(pivot, cmap='YlGnBu', annot=True, fmt=',.0f')
plt.title('Total Number of Visitors Segmented by Each Level, Every Month in Each Year')
plt.show()

Code Text

ψ

Problem 2

What is the percentage difference in the number of visitors between different regions and years?

Code Text

# To calculate the percentage difference we will use the pivot table 
pivot2 = pd.pivot_table(df, values='Value', index=['Year', 'Region'], columns='Segment', aggfunc='sum')
pivot2['Total'] = pivot2.sum(axis=1)
pivot2['Percent Diff'] = pivot2.groupby(['Region']).Total.pct_change()*100

Code Text

# Create a bar plot to visualize the pivot table to easily understand the data

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
pivot2['Percent Diff'].plot(kind='bar', ax=ax[0], title='Percent Difference (Before)')
pivot2.dropna()['Percent Diff'].plot(kind='bar', ax=ax[1], title='Percent Difference (After)')
plt.tight_layout()
plt.show()


Code Text


# Create a pivot table to aggregate the 'Value' column by the different columns
pivot_table = pd.pivot_table(df, index=['Year', 'Month', 'Segment', 'Region', 'KPI', 'Value Type'], values=['Value'], aggfunc=[sum, len])

# Print the pivot table
print(pivot_table)


                                                         sum   len
                                                       Value Value
Year Month Segment   Region KPI          Value Type               
2020 1     Clients   India  Lv1_Visitors Actuals     3322789     1
                            Lv2_Visitors Actuals     2304318     1
                            Lv3_Visitors Actuals     2410633     2
                            Lv4_Visitors Actuals     1157186     2
                            Lv5_Visitors Actuals      363939     2
...                                                      ...   ...
2022 12    Customers Ujjain Lv1_Visitors Actuals       18327     1
                            Lv2_Visitors Actuals       11600     1
                            Lv3_Visitors Actuals        6821     1
                            Lv4_Visitors Actuals        2556     1
                            Lv5_Visitors Actuals         293     1

[1500 rows x 2 columns]
Code Text

# Calculate the summary statistics for the 'Value' column
value_stats = df['Value'].describe()
print(value_stats)
count    1.572000e+03
mean     2.964329e+05
std      1.130364e+06
min      1.370000e+02
25%      9.247750e+03
50%      3.257450e+04
75%      1.796810e+05
max      3.698690e+07
Name: Value, dtype: float64
Code Text

import seaborn as sns

# Create a boxplot of the 'Value' column for each unique combination of the six columns
sns.boxplot(x='Year', y='Value', hue='KPI', data=df)

Code Text

ψ

Problem 3

Handle the outliers:

The Tukey method will be applied in this particular case to find and eliminate outliers.

Code Text

# Create a copy of the dataframe
df_copy = df.copy()

# Create a boxplot to visualize the distribution of values
plt.boxplot(df_copy['Value'])
plt.title('Boxplot of Value')
plt.show()
Code Text

Code Text

ψ

Part 2: Prescriptive Analysis

Code Text

ψ

Problem 1

● Transpose the data into a view as the reference given.

Approach :

using the pandas library we reads an Excel file called 'assignment.xlsx' and reads the 'Funnel' sheet into a Pandas DataFrame called df

After this the DataFrame is melted using the melt() function to convert the 'KPI' column, which contains the levels, into rows. The level number is extracted from the 'KPI' column using the str.extract() function and stored in a new 'Level' column. Then, a new DataFrame called pivoted_df is created by pivoting the melted DataFrame, so that there is one row for each unique combination of Year, Month, Segment, Region, and Value Type, and a new column is created for each level of visitors, using the pivot_table() function. The columns in pivoted_df are renamed to include the level number and the prefix "Lv" and "Visitors". Finally, the index of pivoted_df is reset to get the Year, Month, Segment, Region, and Value Type as columns in a new DataFrame called transpose_df. The head() function is used to preview the first five rows of transpose_df.

Code Text

Code Text

ψ

Problem 2

Assume you are a data analyst at Fittlyf company, analyze the data for the region performing worst in all the years and prescribe what could be the reason and how to improve the number of visitors from that region.

Code Text

ψ

Solution

To analyze the data for the region performing worst in all the years, we can follow these steps:

  1. By adding all the values in the columns Lv1_Visitors to Lv5_Visitors for each of the rows, we are able to calculate the total amount of visitors from each region.

  2. After that, we may organize the information by region and figure out how many people visited that region overall across all years.

  3. The region with the fewest visitors can then be found by ascending the data to find it.

  4. Once the area has been determined, we can further analyze the data to ascertain the causes of the poor performance. This could involve taking into consideration elements like the segment type, the value kind, the season, and any localized marketing or promotional initiatives.

Code Text

# By adding all the values in the columns Lv1_Visitors to Lv5_Visitors for each of the rows, we are able to calculate the total amount of visitors from each region.

# Select relevant columns
region_df = transpose_df[['Year', 'Month', 'Region', 'Lv1_Visitors', 'Lv2_Visitors', 'Lv3_Visitors', 'Lv4_Visitors', 'Lv5_Visitors']]

# Calculate total number of visitors from each region
region_df['Total_Visitors'] = region_df['Lv1_Visitors'] + region_df['Lv2_Visitors'] + region_df['Lv3_Visitors'] + region_df['Lv4_Visitors'] + region_df['Lv5_Visitors']

# Group data by region and calculate total number of visitors for each region across all years
region_visitors = region_df.groupby('Region')['Total_Visitors'].sum().reset_index()

# Sort data in ascending order to identify region with lowest number of visitors
worst_region = region_visitors.sort_values('Total_Visitors').iloc[0]['Region']



<ipython-input-25-d11952a02270>:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  region_df['Total_Visitors'] = region_df['Lv1_Visitors'] + region_df['Lv2_Visitors'] + region_df['Lv3_Visitors'] + region_df['Lv4_Visitors'] + region_df['Lv5_Visitors']
Code Text

worst_region
Code Text

ψ

Solution:

The data analysis reveals that Aurangabad has had the worst performance throughout all years. This region consistently receives less tourists than other locations.

We must examine the various visitor types and demographic groups in this area if we are to comprehend the cause of the poor performance. We can observe from the data that level 1 and level 2 visitors make up the majority of visitors in Aurangabad. At higher levels, there are fewer visits overall, with level 5 seeing the lowest number of visitors.

Possible reasons for this low performance could be a lack of awareness or marketing efforts targeted at higher level visitors, lack of infrastructure and amenities to cater to higher level visitors, or a general lack of interest in health and fitness in this region.

The business should concentrate on targeted marketing and awareness activities to draw in higher calibre guests in order to increase the number of visits from Aurangabad. To meet the demands of higher level tourists, they could also make investments in enhancing the facilities and services. Finally, they can think about doing research and analysis to comprehend the local market and adjust their product offerings.

Code Text

ψ

Problem 3

Based on the given data, identify which region is having a better YearOnYear growth.

Code Text

import pandas as pd

# Load the data into a dataframe
df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

# Group the data by Region and Year, and calculate the sum of Value for each group
grouped_df = df.groupby(['Region', 'Year'])['Value'].sum().reset_index()

# Calculate the YoY growth rate for each region
grouped_df['YoY Growth Rate'] = grouped_df.groupby('Region')['Value'].pct_change()

# Get the region with the highest YoY growth rate
best_region = grouped_df.sort_values(by='YoY Growth Rate', ascending=False)['Region'].iloc[0]

print("The region with the better YearOnYear growth is:", best_region)

The region with the better YearOnYear growth is: Dehradun
Code Text

ψ

The region with the better YearOnYear growth is:

  • Dehradun
Code Text

ψ

Problem 4

● Guess what could be the meaning of Level1 visitors, level 2 visitors up to level 5 and then answer the following-

  1. Create a new feature (Level 5 visitors/Level 1 visitors) and what are the top 3 states based on that created feature for all the available segments and each given year.
  2. Create any other metric apart from (Level5 visitors/Level1 visitors) and perform the same task as above. Compare if the states are same in both the questions, if they are different try to create a hypothesis about the reason behind it.
Code Text

ψ

Solution 1:-

It is possible that Level 1 visitors, Level 2 visitors, and Level 5 visitors all refer to distinct phases or levels of website visitors based on the naming convention. Each level could stand for a specific online activity or behaviour.

For instance, Level 1 visitors might be those who first arrived at the website, Level 2 visitors might be those who browsed particular product categories, Level 3 visitors might be those who clicked on a particular product, Level 4 visitors might be those who added items to their cart, and Level 5 visitors might be those who finished the checkout process.

Code Text

ψ

Solving 1

To create a new feature we must extract the data from the KPI column, which contains the levels in the form of Lv1_Visitors to Lv5_Visitors, in order to develop a new feature (Level 5 visitors/Level 1 visitors). Using regular expressions, we can accomplish this and then carry out the calculation.

Code Text

Year: 2020 , Segment: Clients
0    India
Name: Region, dtype: object


Year: 2020 , Segment: Customers
3    Faridabad
5       Uddepy
6       Ujjain
Name: Region, dtype: object


Year: 2021 , Segment: Clients
7    India
Name: Region, dtype: object


Year: 2021 , Segment: Customers
10     Faridabad
8     Aurangabad
13        Ujjain
Name: Region, dtype: object


Year: 2022 , Segment: Clients
16      Indore
14    Dehradun
18      Ujjain
Name: Region, dtype: object


Year: 2022 , Segment: Customers
23        Uddepy
19    Aurangabad
24        Ujjain
Name: Region, dtype: object


Code Text

ψ

The above output tells us the top 3 states for each year and every Segment

Code Text

ψ

Solution 2:-

In this case we will use the ratio of Level 4 Visitors to Level 1 Visitors to develop a new statistic.

Code Text

import pandas as pd

# Load the data
df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

# Calculate the percentage change in value from first month to last month for each segment and region
df['YearMonth'] = df['Year'].astype(str) + '-' + df['Month'].astype(str).str.zfill(2)
df['FirstMonth'] = df.groupby(['Segment', 'Region', 'Year'])['Value'].transform('first')
df['LastMonth'] = df.groupby(['Segment', 'Region', 'Year'])['Value'].transform('last')
df['Change'] = ((df['LastMonth'] - df['FirstMonth']) / df['FirstMonth']) * 100

# Pivot the data to have a column for each segment and a row for each region and year
df_pivot = df.pivot_table(index=['Region', 'Year'], columns='Segment', values='Change')

# Print the top 3 states for each year and segment based on the new metric
for year in df['Year'].unique():
    for segment in df['Segment'].unique():
        print(f'Top 3 states for {year} - {segment}')
        segment_data = df_pivot[segment].loc[:, year].dropna()
        segment_data_sorted = segment_data.sort_values(ascending=False)
        top_3_states = segment_data_sorted.head(3)
        print(top_3_states)
        print()

Top 3 states for 2020 - Clients
Region
India   -97.655773
Name: Clients, dtype: float64

Top 3 states for 2020 - Customers
Region
Uddepy       -83.740171
Aurangabad   -84.881839
Ujjain       -87.995754
Name: Customers, dtype: float64

Top 3 states for 2021 - Clients
Region
India   -97.84048
Name: Clients, dtype: float64

Top 3 states for 2021 - Customers
Region
Dehradun     -76.440596
Aurangabad   -77.763027
Ujjain       -78.872662
Name: Customers, dtype: float64

Top 3 states for 2022 - Clients
Region
Indore     -83.631722
Dehradun   -85.625803
Uddepy     -88.592267
Name: Clients, dtype: float64

Top 3 states for 2022 - Customers
Region
Uddepy       -91.099240
Aurangabad   -91.509586
Ujjain       -92.846620
Name: Customers, dtype: float64

Code Text

ψ

The above output tells us the top 3 states for each year and every Segment

Code Text

ψ

Part 3: Prediction

Code Text

ψ

Problem 1

Write a function called predict_future(‘Region’,’Segment’) which, when called, would perform the following activity:

  1. Predict “Level 5” future values for the next 6 months, given the parameters of the function. (Please make sure the parameters have default values in place) Also, plot it.
Code Text

!pip install prophet


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: prophet in /usr/local/lib/python3.10/dist-packages (1.1.2)
Requirement already satisfied: convertdate>=2.1.2 in /usr/local/lib/python3.10/dist-packages (from prophet) (2.4.0)
Requirement already satisfied: tqdm>=4.36.1 in /usr/local/lib/python3.10/dist-packages (from prophet) (4.65.0)
Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.10/dist-packages (from prophet) (1.22.4)
Requirement already satisfied: LunarCalendar>=0.0.9 in /usr/local/lib/python3.10/dist-packages (from prophet) (0.0.9)
Requirement already satisfied: matplotlib>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from prophet) (3.7.1)
Requirement already satisfied: python-dateutil>=2.8.0 in /usr/local/lib/python3.10/dist-packages (from prophet) (2.8.2)
Requirement already satisfied: pandas>=1.0.4 in /usr/local/lib/python3.10/dist-packages (from prophet) (1.5.3)
Collecting cmdstanpy>=1.0.4
  Downloading cmdstanpy-1.1.0-py3-none-any.whl (83 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.2/83.2 kB 4.1 MB/s eta 0:00:00
Requirement already satisfied: holidays>=0.14.2 in /usr/local/lib/python3.10/dist-packages (from prophet) (0.23)
Requirement already satisfied: pymeeus<=1,>=0.3.13 in /usr/local/lib/python3.10/dist-packages (from convertdate>=2.1.2->prophet) (0.5.12)
Requirement already satisfied: korean-lunar-calendar in /usr/local/lib/python3.10/dist-packages (from holidays>=0.14.2->prophet) (0.3.1)
Requirement already satisfied: hijri-converter in /usr/local/lib/python3.10/dist-packages (from holidays>=0.14.2->prophet) (2.3.1)
Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from LunarCalendar>=0.0.9->prophet) (2022.7.1)
Requirement already satisfied: ephem>=3.7.5.3 in /usr/local/lib/python3.10/dist-packages (from LunarCalendar>=0.0.9->prophet) (4.1.4)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (4.39.3)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (8.4.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (1.0.7)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (23.1)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (3.0.9)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->prophet) (1.4.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.0->prophet) (1.16.0)
Installing collected packages: cmdstanpy
  Attempting uninstall: cmdstanpy
    Found existing installation: cmdstanpy 0.9.5
    Uninstalling cmdstanpy-0.9.5:
      Successfully uninstalled cmdstanpy-0.9.5
Successfully installed cmdstanpy-1.1.0
Code Text

import pandas as pd
from prophet import Prophet

df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1572 entries, 0 to 1571
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Year        1572 non-null   int64 
 1   Month       1572 non-null   int64 
 2   Segment     1572 non-null   object
 3   Region      1572 non-null   object
 4   KPI         1572 non-null   object
 5   Value Type  1572 non-null   object
 6   Value       1572 non-null   int64 
dtypes: int64(3), object(4)
memory usage: 86.1+ KB
Code Text

Code Text

ψ

Testing


# Test the function with Region1 and Segment1

result = predict_future('Dehradun', 'Clients')
print(result)

# Test the function with Region2 and Segment3
result = predict_future('Ujjain', 'Customers')
print(result)



Code Text

ψ

Problem 2

Generates the MAPE and RMSE of your prediction of the year 2022, 2021 & 2020 for the given parameters.

Code Text

import pandas as pd
from prophet import Prophet
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

def predict_future(region='Region1'segment='Segment1'):
    # Load the dataset
    df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

    # Filter the dataset based on the provided Region and Segment
    df = df[(df['Region'] == region) & (df['Segment'] == segment)]

    # Remove any rows with NaN/Null values in the 'Value' column
    df = df.dropna(subset=['Value'])

    # Convert the 'Year' and 'Month' columns into a datetime format
    df['ds'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Month'].astype(str))

    # Create a new DataFrame with the filtered data and the 'Year' and 'Month' columns as the index
    df = df.set_index(['Year', 'Month'])

    # Rename the 'Value' column to 'y'
    df = df.rename(columns={'Value': 'y'})

    # Create a new Prophet model and fit it to the data
    model = Prophet()
    model.fit(df)

    # Use the `make_future_dataframe()` function to create a DataFrame with the date range for the next 3 years
    future = model.make_future_dataframe(periods=36, freq='M')

    # Use the `predict()` function of the fitted model to predict the 'Level 5' values for the next 3 years
    forecast = model.predict(future)

    # Filter the predictions for the years 2020, 2021, and 2022
    predictions = forecast[forecast['ds'].dt.year.isin([2020, 2021, 2022])][['ds', 'yhat']].reset_index(drop=True)

    # Calculate the actual values for the years 2020, 2021, and 2022
    actual = df[df.index.get_level_values('Year').isin([2020, 2021, 2022])]['y'].reset_index(drop=True)

    # Calculate the MAPE and RMSE for the predictions and actual values
    mape = np.mean(np.abs((actual - predictions['yhat']) / actual)) * 100
    rmse = np.sqrt(np.mean((actual - predictions['yhat']) ** 2))

    # Return the predictions, actual values, and the MAPE and RMSE
    return {'Predictions': predictions, 'Actual': actual, 'MAPE': mape, 'RMSE': rmse}


Code Text

ψ

Testing

Code Text

result = predict_future(region='Aurangabad', segment='Customers')
print(result['MAPE'])  # Prints the MAPE
print(result['RMSE'])  # Prints the RMSE

INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmptdxzkjmb/ley2m8vo.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmptdxzkjmb/fxmrjkz0.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=21634', 'data', 'file=/tmp/tmptdxzkjmb/ley2m8vo.json', 'init=/tmp/tmptdxzkjmb/fxmrjkz0.json', 'output', 'file=/tmp/tmptdxzkjmb/prophet_modelpbbzibf4/prophet_model-20230506173550.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
17:35:50 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
17:35:50 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing
235.43431334790165
2917.182063764239
Code Text

ψ

Problem 3

Plot a line graph of the level 5 actual numbers from 2020-2022 & in the same graph, there should be the predicted numbers for 2023. The x-axis should be the timeline from 2020 Jan to 2023 Jun and the y-axis should be the value of the level 5 column and predicted values.

Code Text

!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: ipywidgets in /usr/local/lib/python3.10/dist-packages (7.7.1)
Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (0.2.0)
Requirement already satisfied: ipython>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (7.34.0)
Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (3.6.4)
Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (5.5.6)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (3.0.7)
Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets) (5.7.1)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets) (6.1.12)
Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets) (6.2)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (3.0.38)
Requirement already satisfied: pygments in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (2.14.0)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (4.8.0)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (67.7.2)
Collecting jedi>=0.16
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 16.2 MB/s eta 0:00:00
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (0.2.0)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (0.7.5)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (0.1.6)
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython>=4.0.0->ipywidgets) (4.4.2)
Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.10/dist-packages (from widgetsnbextension~=3.6.0->ipywidgets) (6.4.8)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets) (0.8.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (3.1.2)
Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (5.8.0)
Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (1.8.0)
Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (21.3.0)
Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.17.1)
Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (1.5.6)
Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (5.3.0)
Requirement already satisfied: pyzmq>=17 in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (23.2.1)
Requirement already satisfied: nbconvert in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (6.5.4)
Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.16.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.10/dist-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets) (2.8.2)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0->ipywidgets) (0.2.6)
Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.10/dist-packages (from jupyter-core>=4.6.1->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (3.3.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.1->jupyter-client->ipykernel>=4.5.1->ipywidgets) (1.16.0)
Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (21.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (2.1.2)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.4)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.7.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.8.4)
Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (6.0.0)
Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.7.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (1.5.0)
Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (1.2.1)
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (4.9.2)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (4.11.2)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (23.1)
Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.2.2)
Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (4.3.3)
Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (2.16.3)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.19.3)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (23.1.0)
Requirement already satisfied: cffi>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (1.15.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (2.4.1)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (0.5.1)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets) (2.21)
Installing collected packages: jedi
Successfully installed jedi-0.18.2
Enabling notebook extension jupyter-js-widgets/extension...
Paths used for configuration of notebook: 
    	/root/.jupyter/nbconfig/notebook.json
Paths used for configuration of notebook: 
    	
      - Validating: OK
Paths used for configuration of notebook: 
    	/root/.jupyter/nbconfig/notebook.json
Code Text

import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt

def predict_future(region='Region1'segment='Segment1'):
    # Load the dataset
    df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/assignment.xlsx', sheet_name='Funnel')

    # Filter the dataset based on the provided Region and Segment
    df = df[(df['Region'] == region) & (df['Segment'] == segment)]

    # Remove any rows with NaN/Null values in the 'Value' column
    df = df.dropna(subset=['Value'])

    # Convert the 'Year' and 'Month' columns into a datetime format
    df['ds'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Month'].astype(str))

    # Create a new DataFrame with the filtered data and the 'Year' and 'Month' columns as the index
    df = df.set_index(['Year', 'Month'])

    # Rename the 'Value' column to 'y'
    df = df.rename(columns={'Value': 'y'})

    # Create a new Prophet model and fit it to the data
    model = Prophet()
    model.fit(df)

    # Use the `make_future_dataframe()` function to create a DataFrame with the date range for the next 6 months
    future = model.make_future_dataframe(periods=18, freq='M')

    # Use the `predict()` function of the fitted model to predict the 'Level 5' values for the next 6 months
    forecast = model.predict(future)

    # Plot the predicted values using the `plot()` function of the `matplotlib` library
    plt.plot(df['ds'], df['Level5 Visitors'], label='Actual')
    plt.plot(forecast['ds'], forecast['yhat'], label='Predicted')

    # Set the labels for the x and y axes
    plt.xlabel('Timeline')
    plt.ylabel('Level 5 Visitors')

    # Set the title for the graph
    plt.title('Actual vs Predicted Level 5 Visitors')

    # Add a legend to the graph
    plt.legend()

    # Display the graph
    plt.show()

    # Return the predicted values for the next 18 months
    return forecast[['ds', 'yhat']].tail(18)

Code Text

ψ

Part 4: A/B testing

Code Text

ψ

Problem 1

Using “AB_TEST” sheet in the shared excel file, what is the possible metric you can create for A/B testing excluding no. of clicks and no. of visitors.

Code Text

ψ

The metric that can be created for A/B testing excluding no. of clicks and no. of visitors are as follows:

  1. Conversion rate: This is the proportion of visitors who complete an intended action, such buying something or filling out a form. By dividing the quantity of conversions by the quantity of visits, it may be estimated.

  2. Revenue per user: This metric considers the typical sum of money earned by each user. By dividing the total revenue by the number of users, it may be computed.

  3. Engagement rate: It is an indicator of user engagement with a website . It can be computed by dividing the total number of visitors by the number of active users.

  4. Bounce rate: This statistic shows the proportion of site visitors who leave after just seeing one page. You may figure it out by dividing the total number of visits by the number of one-page visits.

  5. Average session duration: This indicator examines how long visitors typically stay on a website. We can Compute it by dividing the total time spent on site by the number of sessions.

Code Text

ψ

Problem 2

Perform an AB testing to find which variation whether control or treatment is better.

Code Text

ψ

Solution :

To perform AB testing, we can define the null hypothesis as "there is no difference in the conversion rate between the control and treatment groups" and the alternative hypothesis as "there is a statistically significant difference in the conversion rate between the control and treatment groups".

AB testing is commonly used to examine the success rate of two variations and identify which is superior. Based on the given quantity of clicks and visitors, we want to decide which of two variations the control and treatment is more effective in this situation.

Code Text

Clicks p-value: 2.4230205257275484e-20
Visitors p-value: 1.2501424893255777e-17
Control is the winning variation
Code Text

Code Text